Variational and perturbative formulations of QM/MM free energy with mean-field 

embedding and its analytical gradients 
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Conventional quantum chemical solvation theories are based on the mean-field embedding ap- 
proximation. That is, the electronic wavefunction is calculated in the presence of the mean field 
of the environment. In this paper a direct quantum mechanical/molecular mechanical (QM/MM) 
analog of such a mean-field theory is formulated based on variational and perturbative frameworks. 
In the variational framework, an appropriate QM/MM free energy functional is defined and is min- 
imized in terms of the trial wavefunction that best approximates the true QM wavefunction in a 
statistically averaged sense. Analytical free energy gradient is obtained, which takes the form of 
the gradient of effective QM energy calculated in the averaged MM potential. In the perturbative 
framework, the above variational procedure is shown to be equivalent with the first-order expansion 
of the QM energy (in the exact free energy expression) about the self-consistent reference field. 
This helps understand the relation between the variational procedure and the exact QM/MM free 
energy as well as existing QM/MM theories. Based on this, several ways are discussed for evaluating 
non- mean-field effects (i.e., statistical fluctuations of the QM wavefunction) that are neglected in 
the mean-field calculation. As an illustration, the method is applied to an Sn2 Menshutkin reaction 
in water, NH3 -I- CH3CI NH3CHJ + Cl~, for which free energy profiles are obtained at the HF, 
MP2, B3LYP, and BH&HLYP levels by integrating the free energy gradient. Non-mean-field effects 
are evaluated to be < 0.5 kcal/mol using a Gaussian fiuctuation model for the environment, which 
suggests that those effects are rather small for the present reaction in water. 



I. INTRODUCTION 

A combined quantum mechanical/molecular mechani- 
cal (QM/MM) method is a powerful computational tool 
for studying chemical reactions in solution and in biolog- 
ical systemSfii^ It treats a chemically active part of the 
entire system with accurate QM methods while the rest 
of the system with MM force fields. The quality of a given 
QM/MM calculation depends primarily on the electronic 
structure method used. In the calculation of statistical 
properties like free energy, it is also important to ade- 
quately sample the relevant phase spaced However, this 
phase space sampling is very demanding computation- 
ally, because one needs to calculate QM electronic energy 
for a large number of statistical samples. One can ensure 
sufficient statistics by using fast semiempirical methods, 
but the resulting energetics may be less satisfactory than 
obtained with ab initio methods. On the other hand, 
highly correlated QM methods require too much compu- 
tational time and thus it becomes difficult to explore the 
phase space. 

A variety of approaches have been proposed in order 
to address the above trade-off between accuracy and ef- 
ficiency. One approach is a family of dual-level methods, 
in which a classical or semiempirical potential is used 
for statistical sampling and an accurate QM method for 
energetic correctionsj^'^'i'^i^ii^iiiii^ii^ii^ii^ Another ap- 
proach is to introduce some approximation to the QM- 
MM electrostatic interactions in order to reduce the num- 
ber of QM calculations. Our main interest in this paper 
is in the second approach above. In particular, we are 
concerned with the following three embedding schemes 
that prescribe how to calculate the QM wavefunction in 
the MM environment: 



(1) Gas-phase embedding scheme. This scheme to- 
tally neglects electrostatic perturbations of the MM en- 
vironment on the QM subsystem. The QM wavefunc- 
tion is calculated a priori in the gas phase, and the re- 
sulting charge density or partial charges are embedded 
into the MM environment. The reaction path is also de- 
termined by the gas-phase calculation. The free energy 
profile in solution is obtained via free energy perturba- 
tion (FEP) calculations along the pre-determined reac- 
tion path. This approach was first utilized by Jorgensen 
and co-workersi^iiiii2iii^ to study organic reactions in so- 
lution, and later by Kollman and co-workers^2i2iiS^ to 
study enzyme reactions. It should be noted however that 
this approach may not be appropriate for a certain class 
of enzyme reactions.— 

(2) Mean-field embedding scheme. This method cal- 
culates the QM wavefunction in the presence of the 
mean field of the environment. The averaged polar- 
ization (or distortion) of the QM wavefunction is thus 
correctly taken into account, while statistical fiuctu- 
ations of the QM wavefunction are totally neglected. 
Indeed, this mean-field approximation has been the 
basis of many conventional solvation models like the 
PCMSii^s g^^^ RiSM-SCF^i22i28^ methods. The mean- 
field idea has also been applied to the QM/MM frame- 
work by several authors. For example, Aguilar and 
co-workers^2i^i^i2^iM performed geometry optimization 
on an approximate QM/MM free energy surface using 
the averaged solvent electrostatic potential (ASEP)/MD 
method. More recently, the mean-field idea was ex- 
ploited by Warshel and co-workers^^ in order to accel- 
erate QM/MM calculation of solvation free energy. 

(3) Polarizable embedding scheme. This method first 
develops a polarizable model of the QM subsystem and 
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then embeds the resulting model into the MM environ- 
ment. The polarizable QM model can be developed, for 
example, by Taylor expanding the QM electronic energy 
up to second order ^i^liSS^^^^^ The QM/MM min- 
imum free energy path (MFEP) method by Yang and 
co-worker3^2ii2iiilii^ is based on this perturbative expan- 
sion idea, which has been applied to chemical reactions 
in solution and enzymes. Among the three embedding 
schemes above, the polarizable one is most accurate by 
allowing statistical fluctuations of the QM wavefunction. 

The first goal of this paper is to formulate the mean- 
field embedding scheme above by starting from a varia- 
tional principle for the QM/MM free energy (Sec. IIIBj) . 
As mentioned above, conventional solvation models are 
based on the mean-field embedding approximation. They 
often start with a variational principle for the following 
free energy^ii^di 

A(rQM) = (^I^qmI^-) + A^oiv[*]. (1) 

Minimization of A(rQM) in terms of \E' gives a nonlin- 
ear Schrodinger equation for \E' that is subject to the 
mean field of the environment. Very often, analytical 
gradient of free energy, 9^(rQM)/9rQM, is obtained by 
utilizing the variational nature of ^(rQM). Since those 
solvation models are quite successful in studying solution- 
phase chemistry, it is natural to try to extend them to the 
QM/MM framework. The main benefits of this extension 
are as follows. First, QM/MM models can describe inho- 
mogeneous as well as homogeneous environments on an 
equal theoretical footing. This makes it more straight- 
forward to compare the chemical reactivity of a system 
in different environments (e.g., in solution and enzymes). 
Second, since the mean-field QM wavefunction is calcu- 
lated only for a "batch" of MM configurations, the num- 
ber of QM calculations can be made significantly smaller 
than a direct QM/MM statistical calculation. As men- 
tioned above, such a mean-field QM/MM approach has 
been explored by several authors in the literature. For 
example, Angyan^ discussed such a method quite a few 
years ago based on a variational principle and linear- 
response approximation (LRA) . More recently, Kato and 
co-workers^^ developed the QM/MM LRFE method 
using a different type of variational/LRA idea and ap- 
plied it to chemical reactions in solution and enzymes. 
On the other hand, Aguilar and co-workers2i took a dif- 
ferent approach in the ASEP/MD method, where they 
did not invoke a variational principle nor LRA but rather 
approximated the free energy gradient as follows: 

dA{rQM) /9£:(rQM,rMM)\ ^ 

OTQM \ OTqm / OTqm 

(2) 

Here, £'(rQM,rMM) is the total energy of the QM/MM 
system and (• • • ) denotes the statistical average over MM 
degrees of freedom. Note that in Eq. Q "the average of 
the energy gradient" (the exact expression) is replaced by 
"the gradient of the averaged energy" in the spirit of the 
mean-field approximation. While Aguilar et al. demon- 
strated its accuracy via comparison with direct QM/MM 



calculations the detailed derivation of Eq. ^ was not 
provided and it was used as an ansatz. Therefore, our 
first aim in this paper is (i) to formulate a mean-field 
QM/MM framework by starting from a variational prin- 
ciple (but not invoking the LRA), (ii) obtain analytical 
gradient of the associated free energy, and (iii) discuss a 
possible rationale for the approximate gradient in Eq. 

The second goal of this paper is to understand the re- 
lation of the above variational/mean-field procedure with 
the underlying exact QM/MM free energy as well as ex- 
isting QM/MM theories fSecs. [nC] and HTD]) . First, it 
is shown that the above variational procedure is equiva- 
lent with the first-order expansion of effective QM en- 
ergy (in the exact free energy expression) about the 
self-consistent reference field. As mentioned above, the 
QM/MM-MFEP method'*"'''^ is based on this type of 
perturbative expansions. Therefore, it is interesting to 
compare the present approach with the QM/MM-MFEP 
method in detail (Appendix [C|. From this comparison it 
follows that the variational procedure is essentially equiv- 
alent with Model 3 of the QM/MM-MFEP method with 
charge response kernel x neglected. Note however that 
the full version of Model 3 includes that response kernel x 
and thus it is more accurate by describing statistical fiuc- 
tuations of the QM wavefunction. Therefore, in Sec. Ill Dl 
we also discuss several possible ways for evaluating such 
non-mean-field effects on top of the variational/mean- 
field calculation. 

As an illustration, the present method is applied to 
an Sn2 reaction in water fSec. IIIip . Free energy profiles 
are obtained by integrating the free energy gradient and 
they are compared with free energy perturbation (FEP) 
results. Non-mean-field effects are also evaluated using 
a Gaussian fluctuation model for the environment. The 
obtained results suggest that the non-mean-field effects 
are rather small for the present reaction in water. 



II. METHODOLOGY 

A. The underlying QM/MM free energy 

We consider the following QM/MM free energy (or the 
potential of mean forces acting on QM atoms) 

A(R) = - ^In J dR+ exp{-pEiR,R+)}, (3) 

where R and R"*" are Cartesian coordinates of the QM 
and MM atoms, respectively, /3 = l/ksT is the reciprocal 
temperature, and -E(R, R^) is the total energy given by 

£;(R,R+) = £QM(R,VMM(R,R+)) + fMM(R,R+). (4) 

Here, ^qm is the electronic energy of the QM subsys- 
tem in the presence of an external electrostatic field 
(called the effective QM energy). In the standard elec- 
tronic embedding scheme, it is defined via the following 
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Schrodinger equation 



(5) 

(note that the prime symbol wiU be attached on vari- 
ables and functions of "dummy" nature). Hqm is the 
QM Hamiltonian in the gas phase, /5(x) is the QM charge 
density operator, 



nuc clc 
p(x) = ZaS{^ - Ra) - ^ <5(X - fO, 

a i 



(6) 



and f'(x) is an (arbitrary) external electrostatic field. In 
this paper we will utilize a discretized approximation to 
Eq. (O given by 

[^QM + ^')) = ^Qm(R, v')|*(R, v')), 

a 

where {Qa} are a set of "partial charge" operators as- 
sociated with QM atoms {Ra}, and v'^ = v'CRa)- The 
motivation for using Eq. ([7]) is that the external field 
can be parametrized by an A^-dimensional vector v' — 
{v'l, . . . , v'pf), where N is the number of QM atoms. This 
fact makes the following discussion somewhat simpler. 
Nevertheless, we stress that there is no fundamental 
difhculty in using the original Schrodinger equation in 
Eq. (O; see Appendix |A] for such a formulation. In Ap- 
pendix |B1 we summarize the present definition of the 



partial charge operator Q — {Qi, ■ ■ ■ ,Qn) based on the 
electrostatic potential (ESP) fitting procedure.**^ 

Now going back to Eq. (g]), vmm(R, R^) = 
(I'MM,!, ■ • • , I'MM.Jv) are MM electrostatic potentials act- 
ing on QM atoms, 



VMM 



where 



wmm(x,R+) = Y 



1i 



MM 



x-R; 



(8) 



(9) 



with {qi } being partial charges of the MM atoms. 
£mm(R, R^) is the sum of the van der Waals interactions 
between QM-MM subsystems and the internal energy of 
the MM subsystem, 

^mm(R, R"*") = ^'qm/mm(I^' -f^^) + £'Mm(R^)- (10) 

In the following we will sometimes drop the arguments 
of VMM and ^mm for notational simplicity, i.e. vmm = 
vmm(R, R"*") and £mm — ^mm(R, R"*")- 



B. Variational approach for mean-field embedding 

The free energy in Eq. ^ may be rewritten as 



J 



A(R) = -^ln J dR+exp{-/3[(^'(R,VMM)|ffQM + Q ■ vmm|*(R,vmm)) +fMM]}, 

I 



(11) 



by explicitly writing i?qm(R, vmm) in terms of the 
QM wavefunction. Note that vmm always stands for 
vmm(R, R"*") as mentioned above. A direct evaluation of 
A(R) is computationally demanding because ^'(R, vmm) 
depends on R+ through vmm = vmm(R, R^)- To avoid 



repeated QM calculations, let us replace the true wave- 
function ^'(R, Vmm) by some trial one ^(R) that best 
approximates the true wavefunction in a statistically av- 
eraged sense. To do so, we consider a free energy func- 
tional of the form 



i[R,§] = --ln / dR+exp{-/3[(^'|i?QM + Q-VMM|^')+fMM]}. 



(12) 



Since the following inequality holds by definition for arbitrary vmm — vmm(R, R^) [we assume that 4'(R, vmm) is 
the ground state of Hqm + Q • vmm], 



(*(R,VMM)|i?QM + Q-VMM|*(R,VMM)) (*(R) |i7QM + Q • VMM | *(R)) , 

I 



(13) 



we obtain a variational principle for free energy 
A{-R) < i[R,^]. 



(14) 



Namely, j4[R, 5"] is a strict upper bound on A(R), and 
the best approximation to ^(R) is obtained by minimiz- 
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ing A[R, with respect to ^. This variational princi- 
ple is indeed a direct QM/MM analog of the standard 
ones used in conventional solvation theories.— i^"^ By 
minimizing the following Lagrangian to account for the 
normalization of ^t, 

L[R,vI/,A]=i[R,*]-A{('J'|*)-l}, (15) 
we obtain the following stationary condition for ^t: 



^QM + Q- <C VMM 



[*] 



l*)-A|^'). 



(16) 



Here ^ • ■ • ^ represents the statistical average over MM 
degrees of freedom, 

J (iR,+ e~'^[Q'''^"M+^M"l (• • ■ ) 

<C--->Q'= J ^p_+g_/3[Q'.VMM+£MM] ' ^^'^^ 

andQ[vE''] = (^''|Q|\1''). [In this paper the double bracket 
^ • • • ^ indicates that the average is of "classical" na- 



ture, i.e., it does not require repeated QM calculations.] 
Since Eq. (fT6|) is nonlinear with respect to ^, it is usually 
solved via iteration. It follows from comparison between 
Eqs. ([7]) and (fTBl) that ^ and A may be written as 



* ^'(R,v^'=), 
A = fQM(R, v^'^). 



(18a) 
(18b) 



where v*"^ is the self-consistent response field determined 

by 



v''=(R) = <vmm(R,R+) >Q- 
Q^'=(R) = (*^'=|Q|^''''=}, 



(19a) 
(19b) 



with ^""^ = ^'(R,v^'=). In this paper, Eq. ^ will be 
called the self-consistent (embedding) condition. The 
minimum value of the free energy functional is obtained 
by inserting * = "^^^ into Eq. (fH]) : 



J 



mini[R,*] = -iln /dR+exp{-/3[(^'^^|7?QM + Q-VMM|*'')+fMM]} 



A^^{R), (20) 



where A^Imm is defined by 

AylMM(R.,Q') = --In / dR+e-'^['^'-^™+^™l. (21) 
P J 

Note that (\I'*''^|iJQM|^''''^) has been extracted from the 
integral over R+ since it is independent of R"*". By the 
last hne of Eq. we define the QM/MM free energy 
with mean- field embedding approximation, ^'^^(R). 

The analytical gradient of A'^^(R) can be obtained 
using a standard procedure as follows. First, we rewrite 
the A'^^(R) in terms of the Lagrangian, 



(22) 



with A^'^ = £qm(R, v'''^). RecaUing that L[R, A] is sta- 
tionary with respect to ^E" and A, we obtain 



5R 



A^^^^(R) 



dL\'R,^,X\ 



dR 



(23) 



where the R derivative in the right-hand side does not 
act on ^ nor A. We then obtain the analytical gradient 
in Eq. ([33]), which will be discussed in the next section. 



C. Perturbative approach for mean-field 
embedding 



^'^^(R) in Eq. (gOl) can also be obtained from the 
exact ^(R) in Eq. ([3]) by Taylor expanding the effective 
QM energy £qm(R, vmm) up to first order: 



^qm(R, vmm) — ^qm(R, v' 
5£qm(R,v') 



(VMM-V°). (24) 



Here, v° = v°(R) is an arbitrary reference potential that 
is assumed to be independent of R+. Using the following 
Hellman-Feynman theorem for ^qm, 



9gQM(R,vO 
dv' 



= (vl/(R,v')|Q|*(R,v')) 



= Q(R,v'), 

and introducing the internal QM energy as 

i?QM(R,v') = (vI/(R,v')|^qm|*(R,v')), 

or alternately via 

fQM(R, v') = EqmCEI, v') + Q(R, v') • v'. 



(25) 



(26) 



(27) 
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Eq. ([24]) may be rewritten as 
fQM(R, vmm) ^ £^qm(R, v°) + Q(R, v°) • VMM. (28) 

(see Appendix [e1 for cases where the Hellmann-Feynman 
theorem does not hold). Inserting the above expansion 
of £qm into the exact A(R) and extracting £'qm(R, v°) 
from the integral over R+, we obtain 



A(R) ~ Ec^uiR, v°) + AAmm(R, Q°) 



(29) 



with Q° — Q(R, v°). The above equation is very sim- 
ilar to A'^^(R) in Eq. ([201), and indeed, the latter can 
be recovered simply by setting v° to the self-consistent 
potential v'^'^: 



AMF(R) = £;qm(R, v^^) + AAmm(R, 



(30) 



Therefore, the variational principle in Sec. Ill Bl is essen- 
tially equivalent with the first-order expansion of the 
effective QM energy about the self-consistent reference 
field. 



The analytical gradient of A'^^(R) can be obtained by 
first writing A'^^(R) in terms of the effective QM energy 
as 



A^'^'iR) = fQM(R,v^^(R)) - Q^=(R) • v^^(R) 
+AAmm(R,Q^=(R)) 



(31) 



[cf. Eq. ([27]) ]. taking the R derivative of the right-hand 
side, and using the following symmetric relations: 



afQM(R,v') 



9A^mm(R,Q') 



dQ' 



Q^^(R), 



v"^(R) 



(32a) 



(32b) 



Q'=Q- 



It then follows that the derivatives of Q'''^(R) and v'''^(R) 
cancel with each other and we are left with the following: 



J 



95qm(R,v') 



9AAmm(R,Q') 



dR 



Q'=Q= 



(33) 



This is our working equation for the gradient of QM/MM 
free energy with mean-field embedding [see also Eqs. (jA9p 
and (|D4[) for related equations]. The first term is the 
partial R derivative of the effective QM energy {not of 
internal QM energy ^qm), which may be written using 
the Hcllman-Feynman theorem as 



a£QM(R,v') 



dRa 



QM 



dRa 



(34) 

The second term in Eq. ([33|) is the partial R derivative of 
the (classical) solvation free energy, which may be written 
using Eq. ((2T|) as 



9AAmm(R, Q') 



dRa 



. dv 



MM,. 



d£, 



MM 



dRa dRa 



(35) 

An alternative way to obtain the free energy gradient 
in Eq. ([55)) is as follows (see also Appendix [Cjl. First we 
define the mean-field approximation to the total energy 
E{R, R+) and the QM/MM free energy ^(R) as 

i?^F(R, R+) = fQM(R, V-) + Q- . (VMM - V-) + £mm, 

(36) 

and 

^MF/j^) = -lln / dR+exp{-/3i;^^(R,R+)}. (37) 
P J 



The gradient of A'^^(R) then becomes 



as^'l*(R,R^ 



dR 

a£;MF(R,R^ 



dR 



(38) 



Inserting Eq. (j36p into the above equation and using the 
self-consistency condition in Eq. (fT9|) gives the free en- 
ergy gradient in Eq. ([55]) . We note that if the refer- 
ence potential v° is not set at the self-consistent one, i.e. 
v° ^ v'''^(R), the following term appears due to incom- 
plete cancellation among terms. 



dQ°{R) 
dR 



VMM 



(39) 



which requires the derivative of QM charges calculated in 
the reference potential, dQ°{R)/dR = 5Q(R, v°)/9R. 

We now compare the above perturbative approach with 
the QM/MM-MFEP40'41 and ASEP/MD methods.^O-^i 
The QM/MM-MFEP method develops a series of polar- 
izable QM models by Taylor expanding its energy and 
ESP charges up to first or second order. Their compar- 
ison with the present approach is made in Appendix [Cl 
From this comparison it follows that A^^(R) is essen- 
tially equivalent with Model 3 of the QM/MM-MFEP 
method with charge response kernel x neglected. The 
free energy gradient of Model 3 (with x neglected) looks 
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somewhat different from the present result at first sight. 
However, the former can be rewritten using the self- 
consistency condition as follows (see Appendix [C] for the 
notation) 



dAivQu) (9(*|i/off|*)° , /9fMM(rQM,rMM 



dr, 



QM 



dr, 



QM 



dr 



QM 



E 

(40) 

which is essentially equivalent with the present gradient 
expressions [Eqs. and ((D4)) ]. 

The above equation provides some rationale for the 
approximate gradient used by the ASEP/MD method 
[Eq. (121)]. To see this, let us rewrite Eq. (gU]) as 



dAjrQM) 
drQM 



d 



dr 



QM + Vq%mmI*°>, (41) 



QM 



with 



T/MF 
^QM/MM 



^^^P(x)"MM(x,rMM(T)) 

+ ^MM(rQM,rMM('r))]> (42) 



T = l 



which suggests that the gradient of ^(rqivi) may be 
viewed as the gradient of effective QM energy calculated 
in the averaged MM potential. Here it should be noted 
that the rqM-derivative above does not act on ^E"" nor 
r^]y[(T), no matter whether the self-consistency condi- 
tion is assumed or not (see Appendix [P] for details). 



D. Statistical fluctuations of the QM wavefunction 

As seen above, the variational/mean-field approach to- 
tally neglects statistical fluctuations of the QM wavefunc- 
tion about the mean-field state. The aim of this section 
is thus to discuss several ways for evaluating such non- 
mean- field effects on QM/MM free energy. 

First, let us separate the total energy into the mean- 
field and non- mean- field contributions as follows: 



£;(R, R+) = E^" (R, R+) + A£;(R, R+), 



(43) 



where £'"P(R,R+) is defined by Eq. ^ and 
A£'(R, R+) is the remaining part of the total energy. 
Using the definition of ^^(R, R+) in Eq. ([4]), the non- 
mean-field term can be written more explicitly as 

A_E(R, R+) = fQM(R, vmm) 

- £qm(R, v^^) - Q^^ • (VMM - v''^). (44) 

Inserting Eq. into the exact A(R) in Eq. ^ gives 

^(R) - A^^iB.) + AAfluc(R), (45) 

where 

AAfluc(R) = In < exp(-/3A£;) »qsc . (46) 



We note that up to this point Eqs. (|45|) and p6)) are still 
exact. The statistical average in Eq. can be evalu- 
ated rather rigorously as follows. First, one calculates a 
long trajectory of the MM subsystem using the sampling 
function exp(— /3i?'^^), selects a relatively small subset 
of MM configurations from the long trajectory (say, 500 
samples) , and calculates /S.E for those selected configura- 
tions in order to take the average of exp(— /3A£'). Indeed, 
this is a type of dual- level QM/MM sampling method, 
where exp(— /Ji?'^^) is used as a low-cost sampling func- 
tion while Ai? gives energetic corrections. 

Although the above dual-level method is rigorous, it 
requires hundreds of QM calculations and thus may be 
rather expensive. One approach for reducing the compu- 
tational cost is to truncate the expansion of effective QM 
energy at the second order 

Ai?~Ai?(2) ^i(vMM-v-)-XQM-(vMM-v-), (47) 

where Xqm defined by 



Xqm(R-,v') = 



a^gQM(R,vO 

9v'(9v' 



gQ(R,vO 

c»v' 



(48) 



with v' = v'^'^. Xqm is also called the charge response 
kernel due to the second equality in Eq. ((48|) .^^'^^ Once 
Xqu is obtained, the statistical average of exp(— /3Ai?(^)) 
can be evaluated with no extra QM calculations, thus 
significantly reducing the computational cost. The above 
second-order expansion is also utilized by Models 2 and 
3 of the QM/MM-MFEP method [see Eq. ([CSl in the 
present paper] in order to describe statistical fiuctuations 
of the QM wavefunction. '*"''*^ 

A further simplification can be made by introducing 
a Gaussian fiuctuation model for the MM environment. 
Specifically, we assume that the MM electrostatic poten- 
tial acting on QM atoms, vmm = vmm(R, R^), takes a 
(multi-dimensional) Gaussian distributioni ^^i'*^'^" 



< (5(v' - Vmm) >q=« 



oc exp 



1 / / sc\ 

-l^V^ - V ) -cr 



1 

MM 



(v — V 



Here, ctmm is the covariance matrix of vmm 

(sc \ / sc \ 7^ 

VMM — V j(,VMM — V ) 



(49) 



(50) 



By combining AE'^'^^ in Eq. (|47|) and the Gaussian fiuctu- 
ation model above, we obtain an approximate analytical 
expression for Ay4.fluc(R): 

AAfluc(R) =i ^ lndet[l + /5xqmO-mm]. (51) 

Note that since Xqm is negative definite;^ AS^^^ and 
A^fiuc(R) are always negative. The basic appeal of 
Eq. ([5T|) is that once the charge response kernel Xqm 
is obtained, Aylfluc(R-) can also be obtained simultane- 
ously by combining with ctmm that is available from the 
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mean-field calculation. Xqm '^^^ be evaluated most ef- 
ficiently by solving a coupled-perturbed Hartree-Fock or 
Kohn-Sham equation,— i^i^ or more primitively, by nu- 
merically differentiating the ESP charges Q(R, v') with 
respect to v' based on the second equahty in Eq. (|48)) . 

III. APPLICATION TO AN Sn2 REACTION IN 
WATER 

A. Background 

We now apply the above method to a Type-II Sn2 re- 
action in water (the Menshutkin reaction) 

NH3 + CH3C1-^NH3CH+ + Cr. (52) 

This reaction is known to exhibit greatly enhanced 
rates in polar solvents than in the gas phase due to 
strong electrostatic stabilization of the productsi^i^ 
This is in contrast to Type-I Sn2 reactions like 
Cr -I- CH3CI ^ CICH3 + Cr , which are decelerated by 
greater electrostatic stabilization of the reactant than of 
the transition state. Due to the great acceleration in rate, 
the Menshutkin reaction became the subject of many 
theoretical studies^^^^^^^iS^^^^^^^*^ 
Gao and Xia performed the first extensive QM/MM 
study using the AMI model;^ and demonstrated that 
the transition state in water is shifted remarkably toward 
the reactant region. Continuum solvent models were also 
applied to the same reaction at various levels of QM 
methods . ^^'^^1^° While those studies observed that con- 
tinuum models can provide free energetics similar to the 
QM/MM resultsp^ it was also argued that those models 
may not be appropriate for reaction (j52p due to the pres- 
ence of hydrogen bondsi^ Since then, several QM/MM(- 
type) calculations were performedr^'^i^i^i^i^'^i^ in- 
cluding the RISM-SCF method^ a mean-field QM/MM 
approach;^ and a dual-level method»i^ Overall, those 
calculations are in reasonable agreement with each other, 
predicting the free energy of activation AG^ to be 20 ~ 30 
kcal/mol and the free energy of reaction AGr to be —20 
~ —35 kcal/mol (both including solute entropic contri- 
butions). Among those studies, the present one is most 
similar in spirit to the mean- field QM/MM calculation 
by Aguilar and co-workers 

B. Computational details 

Following previous studies, we define the reaction co- 
ordinate as 

s(R) =r(C-Cl) -r(C-N). (53) 

The mean-field free energy yl^^(R) is minimized with re- 
spect to R under the constraint s(R) — s' . The resulting 
optimized geometry will be denoted as R*(s'). Our goal 
here is to obtain the free energy profile A'^^(R*(s')) as a 



function of s' . In this paper we constructed such a profile 
by integrating VA^^ along the optimized reaction path 
R*(s') [i.e., via thermodynamic integration (TI)]: 

A^iF(R*(s,))-AMF(R*(sJ) 

^iys'^^^-VA^^{R*{s% (54) 

where V = d/dH. We calculated R*(s') and 
V^^^(R*(s')) for equally spaced grid points, Sk = 0.2fc 
A (fc = 0, ±1,...), and evaluated the above integral 
via cubic spline interpolation. In practice, the follow- 
ing trapezoid rule was also sufficient for this small size of 
grid spacing: 

AMF(R*(sk))-^^^(R*(so)) 

[R*(sfc)-R*(sfc_i)] 

k=l,K 

X i [V^MF(R*(sfc)) + V^^F(R*(sfe_i))] . (55) 

The geometry optimization on A'^^(R) was performed 
by adapting the sequential sampling/optimization 
method by Yang and co-workers^for the present pur- 
pose. Since the free energy gradient in Eq. (j33|) assumes 
that the self-consistency (SC) condition in Eq. ([TO]) is 
satisfied, one might think that the latter must be solved 
for each step of the optimization. Then, the optimization 
procedure may appear the following: 

1. Given a geometry R*^"^ at an intermediate step n, 
solve the SC condition in Eq. p9)) for obtaining 
v**"^ and Q*"^, and evaluate the analytical gradient 

2. Advance R^") one step, e.g., as R("+i) := R(") - 
AVA'^F(jj^(n)). and 

3. Repeat steps 1 and 2 until a given convergence cri- 
terion is met, e.g., |VA^'^(R("))| < e. 

However, this scheme is somewhat too restrictive because 
the gradient does not have to be exact nor very accurate 
at the early stages of the optimization. What is needed is 
that the gradient becomes increasingly more accurate as 
the optimization proceeds. This observation leads to the 
following variant of the sequential sampling/optimization 
procedure, "^^which performs the statistical sampling of 
the MM environment and the optimization of the QM 
geometry in an iterative manner: 

1. Cycle 0: QM optimization. 

The QM subsystem is optimized in the gas phase 
to prepare the initial state. The resulting QM ge- 
ometry and partial charges are denoted as R*^°-' and 
Cl'-°\ No MM/MD simulation is performed at this 
cycle. 

2. Cycle n: (i) MM sampling. 

The QM geometry R("-i) and charges Q("-i) of 
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the previous cycle are embedded into the MM en- 
vironment. An MM/MD simulation is then per- 
formed to evaluate the averaged electrostatic po- 
tential and the gradient of AAmm in Eq- 



'"^ VMM ^R(>i-l)^Q(n-l), 



(56) 



q(ti) 



dAAuuiR, Q') 



dKa 



(57) 



R("-1),Q("-1) 



Since v(") and G^") = (Gr\ . . . , G^O are simple 
N- and 3A^-dimensional vectors {N is the number 
of QM atoms), they can be accumulated directly in 
the MD simulation. 

3. Cycle n: (ii) QM optimization. 

The QM geometry is optimized in the presence of 
v(") and G("). To this end, we employ the following 
target function for optimizing the QM geometry 



R: 



A(")(R) = £-qm(R,v(")) 

-l-^G(").(R, 



Ri 



The gradient of this target function is 



d 



dKa 



A(")(R) 



5gQM(R, V 

dRa 
afQM(R,v') 



in)] 



(58) 



(59) 



5AAmm(R, Q') 



R("-l),Q('i-l) 



We note that yl("^(R) gives a local approxima- 
tion to A^P(R) in that its gradient VA(")(R) ap- 
proximates the analytical gradient VA^^(R.) in 
Eq. ((33)) . By minimizing A'^"^(R) under the con- 
straint s(R) = s' [i.e., by eliminating the orthog- 
onal component of VA(")(R) to Vs(R)], we ob- 
tain a new QM geometry R^"' for the next cy- 
cle. In this paper the optimization of ^'^"^(R) 
was performed by adding linear external potentials 
Gq • (Rq — Ri"~^') and forces —Ga for individ- 
ual QM atoms in the GAMESS quantum chemistry 
packagers. 

By iterating over the above cycles, the QM geom- 
etry, ESP charges, and MM mean potentials con- 
verge to their asymptotic values, namely R^"^ « 
R("-i), Q(") « Q("-i), and v(") « ^i^-i) ^ rp^-g 
means that the SC condition is satisfied to a good 
accuracy. Since the QM geometry does not move 
any further, we may regard V^'"-* (R^"-*) as pro- 
viding a good approximation to VA'^^(R*) at the 
optimized geometry R* ~ R*^"^ . 



TABLE I: Lennard- Jones parameters for the solute molecule. 
Parameters of the CI atom are taken from Gao and Xia 
(Ref. IsgI ), and those of the other atoms are from the AM- 
BER94 force field (Ref.lzi). 



Atom 



a (A) 



e (kcal/mol) 



C 
N 

He 
Hn 
CI 



3.3996 
3.3409 
2.4713 
1.0691 
4.1964 



0.1094 
0.1700 
0.0157 
0.0157 
0.1119 



The above iterative optimization was used to obtain the 
free energy gradient in Eq. (|54p . 

The other computational details are as follows. The 
QM and MM calculations were performed using modi- 
fied versions of GAMESSl^ and DL_POLY packages.SS 
Following Truong et al.,^^ we used the BHHLYP/6- 
31-|-G(d,p) method in most calculations. Previous study 
shows that this method gives results similar to the 
MP4/aug-cc-pVDZ level for the present reaction.— The 
ESP charge operator and associated fitting grid were de- 
fined following Ten-no et al^ and Spackmani^i^S The 
charge response kernel Xqm ^^s calculated via finite dif- 
ference of the ESP charges Q(R, v') with respect to v'. 

Csj, symmetry was enforced on the QM subsys- 
tem. The optimization tolerance was set to 5 x 10"'' 
hartree/bohr, which is five times larger than the de- 
fault setting in GAMESS. Although v(") and G^") above 
should satisfy C^y symmetry in principle, they do not in 
practice due to statistical errors. These errors gener- 
ate small artificial components of overall translation and 
rotation. We thus removed those components manually 
such that the optimization could be completed to a given 
tolerance. 

MD calculations were performed by solvating one so- 
lute molecule into 253 water molecules (with the TIP3P 
potcntial)^^ in a cubic box of side length 19.7 A at T 
~ 300 K. Periodic boundary condition was applied, and 
electrostatic potentials were calculated using the Ewald 
method. The Lennard- Jones parameters are listed in Ta- 
ble m The timestep for integration was 2 fs. One it- 
erative optimization cycle consisted of 50000 MD steps 
for equilibration and 300000 steps for production. Al- 
though 100000 production steps were sufficient for ob- 
taining a similar result, we did not attempt to minimize 
the computational efforts. Rather, we aimed at obtaining 
a highly converged result, such that the statistical errors 
become comparable to the width of the plotting line. 



C. Free energy profiles 



To illustrate the above optimization procedure. Fig. [T] 
displays the z component of the approximate free energy 
gradient VA(")(R(")) in Eq. §^ as a function of iter- 
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FIG. 1: The z component of approximate free energy gradient 
VA'") (in kcal/mol/A) at s = 0.0 A as a function of iterative 
optimization cycle n. 
60 




2 3 4 5 6 
Optimization cycle 



FIG. 2: The z components of free energy gradient 
VA^''(R*(s)) (in kcal/mol/A) as a function of reaction co- 
ordinate s (in A). The arrow indicates the location of the 
transition state in solution. 




-1 1 

Reaction coordinate (Angs) 



ative optimization cycle n. (Here the solute molecule 
was kept oriented in the z direction of the simulation 
box, so only the z component of the gradient is nonva- 
nishing.) As seen, the approximate gradients converge 
monotonically to their asymptotic values as cycle n pro- 
ceeds. Other quantities like Q*^"^ and v'"^ exhibited a 
similar convergence behavior. We thus expect that the 
self-consistency is achieved to a good accuracy in the last 
few cycles of the iterative procedure. In this paper we 
used 8 cycles for each value of the reaction coordinate s. 
Figure [5] plots the gradients thus obtained as a function 
of s. 

By integrating the gradients in Fig. [21 we obtain a free 
energy profile J^^{s) = J^^{K*{s)) in Fig. [3] (solid line 
with circles). The barrier top of A^^(s) is located at = 
-0.05 A, which corresponds to (C - N) = 2.215 A and 
r^(G — CI) = 2.165 A. The free energy of activation and 
of reaction are defined here as 



MP 



(s = 2.0) 



FIG. 3: Free energy profile P^^^ in Eq. ((30)1 at the 
BIIIILYP/6-3H-G(d,p) level without solute entropic contri- 
butions (solid line with circles). The solid line and circles 
are obtained with Eqs. (|54|l and (|55p . respectively. -Eqm, 
-E'QM(gas), and AtImm represent the internal QM energy in 
Eq. (|26p . its gas-phase counterpart, and the (relative) solva- 
tion free energy in Eq. (I21|l . respectively. All the profiles are 
depicted such that they coincide at s = —1.6 A. 
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Reaction coordinate (Angs) 



which are found to be A^-f — 10.6 kcal/mol and 
A^lr = -38.7 kcal/mol at the BHHLYP/6-31-hG(d,p) 



59,60 



we 



^1.6), 



level. By adding solute entropic contributions 
obtain AG* = A^* 13.1 = 23.7 kcal/mol, which is 
in good agreement with AG''' =25.6 kcal/mol obtained 
by Aguilar et al. at the BHHLYP/aug-cc-pVDZ level,^ 
On the other hand, the reaction free energy is AGr = 
AA^ + 7.5 = —31.2 kcal/mol, which falls within the error 
bar of the experimental result, —34 ± 10 kcal/molj^ 

Figure [3] also illustrates how the internal QM en- 
ergy Eqm(R., 'v^^) and the (relative) solvation free energy 
A^mmIR, Q*''^) vary as functions of s. The gas-phase 
counterpart of the former [i.e., i?QM(R, v' = 0) along 
the gas-phase optimized path] is also plotted. To facili- 
tate the comparison, all the profiles are shifted vertically 
such that they coincide at s — —1.6 A. Figure [3] shows 
that A'^^ is determined by strong cancellation between 
Eqm and A^Imm- While the QM electronic energy in- 
creases steeply with the separation of the ion pair, this 
is more than compensated by strong electrostatic stabi- 
lization by the solvent. Figures [3] and [5] illustrate how 
the QM fragment charges and MM mean potentials vary 
as functions of s. The optimized reaction paths in the 
gas phase and in solution are compared in Fig. [51 As 
stressed previously)^ the transition state in solution [i.e., 
the saddle point of j4'^^(R)] is shifted remarkably toward 
the reactant region. This indicates that for the present 
charge separation reaction, the transition state in the gas 
phase should not be used for calculating the activation 
free energy in solution. 

To check the validity of the free energy gradient, sep- 
arate free-energy perturbation (FEP) calculations were 
also performed. Free energy differences between neigh- 
boring points of Sk (corresponding to circles in Fig. [3]) 
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FIG. 4: Fragment charges for CH3, NH3, and CI atom in 
solution (solid lines) and in the gas phase (dashed lines). 




Reaction coordinate (Angs) 



FIG. 5: Mean electrostatic potentials v'''^ from the MM en- 
vironment. 
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were calculated as 

= In <C exp{-/3AEk+i,k) >fc, (60) 

where = R*(sfc), AEk+i,k = S^P(R^+i,R+) - 
£:^P(R^,R+) with E^^{Ii,Tl+) given in Eq. and 
^ • • ■ denotes the statistical average with the sam- 
pling function exp[— /3i?'^^(R^, R""")]. The necessary in- 
put like R^. was obtained from the TI calculation. Since 
FEP docs not utilize the gradient information, the com- 
parison of FEP profiles with TI ones offers a stringent test 
of consistency between yl'^^(R) and VA^^(R). Figure 
[7] shows that the FEP profiles thus obtained are in excel- 
lent agreement with the TI ones, indicating that the free 
energy gradient is calculated correctly. Although there 
are slight differences between the FEP and TI profiles 
in tlie product region, this may be due to electrostatic 
finite-size effects)^ because the agreement becomes bet- 
ter for a larger number of solvent molecules N = 997 
than iV = 253. 

Figure [5] compares the free energy profiles obtained 
at the HF, MP2, B3LYP, and BHHLYP levels with a 



FIG. 6: Reaction paths optimized in solution (solid line) 
and in the gas phase (dashed line). "TS(aq)" and "TS(gas)" 
represent the transition states corresponding to the top of the 
barrier of and i5QM(gas) in Fig. |3l respectively. 




1.5 2 2.5 3 

r(C-N) (Angs) 



FIG. 7: Free energy profiles obtained with thermodynamic 
integration (TI) and free energy perturbation (FEP). A'^ is 
the number of water solvent molecules. 

20| I 
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larger basis set 6-311+G(2d,2p). This figure shows that 
the MP 2 gives the highest value of the free energy bar- 
rier, AA^ = 15.5 kcal/mol, the B3LYP gives the low- 
est value, 9.2 kcal/mol, and the BHHLYP their interme- 
diate, 11.9 kcal/mol (without including solute entropic 
contributions). Table HH summarizes those values of AA^ 
and AAr obtained with various QM mctliods and ba- 
sis sets. If we assume that the BHHLYP gives the 
"best" energetics for the present reaction,— our main 
results are AG* = 11.9 -t- 13.1 = 25.0 kcal/mol and 
AGr = -37.1 -I- 7.5 = -29.6 kcal/mol (including solute 
entropic contributions). 

To estimate the non- mean- field effects on QM/MM free 
energy, Table HTTl lists the values of AAfluc evaluated using 
the Gaussian fluctuation model in Eq. ([5T|) . This table 
also gives the values of AE'auc defined by 

A£;fluc ^qm(R, vmm) >q- -fQM(R, v"'^), (61) 
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FIG. 8: Free energy profiles obtained at tlie HF, MP2, B3LYP, 
and BHHLYP levels with the 6-311+G(2d,2p) basis set. The 
MP2 gives the highest value of AA* , while the B3LYP gives 
the lowest. See Table|lT]for individual values of A A* and AAr. 
The profiles do not include solute entropic contributions. The 
RESP method is used for all calculations. 
20r 
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TABLE II: Free energy of activation AA* and reaction AAr 
(in kcal/mol) obtained at the HF, MP2, B3LYP, and BHH- 
LYP levels with the 6-31-|-G(d,p) basis. Values in parenthe- 
ses are obtained with the 6-31H-G(2d,2p) basis. To compare 
with the previous studies, one needs to add solute entropic 
contributions to AA* and Aylr such that AG* ~ A^* -I- 13.1 
and AGr ~ Aylr-f7.5 kcal/mol (Refs.lH andUO). The RESP 
method is used for all calculations unless otherwise noted. 



Method 



A^t 



AA, 



HF 
MP2 
B3LYP 
BHHLYP 

BHHLYP" 



12.0 (13.7) 
16.8 (15.5) 
7.8 (9.2) 
10.6 (11.9) 
10.6 (11.9) 



-41.6 (-40.4) 
-35.1 (-34.6) 
-36.9 (-33.9) 
-38.7 (-37.1) 
-38.7 (-34.0) 



"RESP method not used. 



which was also calculated using the Gaussian model as 



1 



tr[XQM'''MM] 



(62) 



This quantity was used previously by Naka et al^ and 
Aguilar et al^ in order to study fluctuations of the QM 
wavefunction in solution. The table shows that the abso- 
lute values of A^auc and Ai^fluc are considerably small 
(< 0.5 kcal/mol) for the entire region of the reaction co- 
ordinate. They are also similar to the values reported for 
other organic molecules in water (AE'auc = —0.2 ^ —0.5 
kcal/mol). 3-'^'^ It should be noted that the impact of 
A^fiuclR) on free energy profiles is even smaller, be- 
cause the variation of A>lfiuc(R*(s)) as a function of s is 
on the order of 0.1 kcal/mol. This result suggests that 
the non- mean- field effects on QM/MM free energy are 
rather small for the present reaction in water. Similar 
observations have been made in the Iiteraturei^i2ii22ii2i 
Nevertheless, we stress that it is not clear at present to 



TABLE III: Fluctuation corrections for the free energy AAauc 
in Eq. (|5ip and for the interaction energy AEnuc in Eq. (|62p 
calculated with the BHHLYP/6-31-|-G(d,p) method. Values 
in parentheses are obtained with the 6-311-|-G(2d,2p) basis. 
RC stands for the reaction coordinate in Eq. (|53|l . All energies 
are given in kcal/mol. 



RC (A) 


AAauc 


A-Bhuc 


-1.6 


-0.43 (-0.44) 


-0.38 (-0.39) 


0.0 


-0.41 (-0.45) 


-0.36 (-0.40) 


2.0 


-0.30 (-0.34) 


-0.28 (-0.31) 



what extent this conclusion applies to different types of 
systems, e.g., enzyme reactions where local fluctuations 
of the MM environment may deviate significantly from 
the Gaussian distribution,^- 



IV. DISCUSSIONS AND CONCLUSIONS 

Numerical stability of the ESP charge operator. ESP 
charges and associated charge operator Q are some- 
times numerically unstable, as often stressed in the 
literature . ^^'^^1^^ For example, we observed an oscillatory 
behavior of partial charges within the CH3 group during 
the optimization cycles. This was typical for s = 0.8 
A, where the ion pair products start to form. Since 
these oscillations are partly due to ambiguous assign- 
ment of partial charges for "buried" atomSj^ the RESP 
method'*^ was of great help in suppressing those oscilla- 
tions. However, the RESP method was of little help in re- 
moving a divergent behavior of partial charges within the 
NH3 group observed in the reactant asymptotic region 
(s < —2.0 A). Specifically, the partial charge on the N 
(Hn) atom kept on growing in the negative (positive) di- 
rection during the optimization cycles. This might be due 
to inherent limitations of the present charge model, where 
partial charges are placed only on atomic nuclei and the 
lone pair on the N atom may be poorly described)^ In 
this respect, it may be more straightforward to use the 
continuous or mixed representation in Appendix [XI or iDl 
where one embeds the MM point char ges directly into 
the QM Hamiltonian. See Refs. [s^ and l4ll for this type 
of implementation. 

FEP that connects optimized geometries. If one is in- 
terested only in the free energy difference between two 
stationary points (e.g., activation free energy), it is prob- 
ably more efficient to use FEP than TI. Specifically, one 
first searches the free energy surface for stationary points 
by using the free energy gradient (and possibly the hes- 
sian), and then connects these points via FEP. The QM 
geometries and charges of intermediate points could be 
generated by linear interpolation of two end points. See 
Ref.ll^for such a calculation. In this way, one can reduce 
the number of costly free energy optimization. If one 
also needs to know a rough free energy profile, one could 
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perform additional optimization for a limited number of 
intermediate points and then connect them via FEP. 

Solute thermal/ entropic contributions. The method in 
this paper calculates the QM/MM free energy for a given 
fixed QM geometry. The thermal/entropic contributions 
of the QM subsystem thus need to be taken into account 
separately, e.g., via harmonic vibration approximations. 
This is a well-known limitation of the present type of 
method, which is also shared by conventional solvation 
theories. To overcome this limitation, several methods 
have been proposed for a priori including the solute flex- 
ibility into the QM/MM free energy calculation at a rea- 
sonable computational costii^i^ 

To conclude, we have presented a direct QM/MM ana- 
log of conventional solvation theories based on variational 
and perturbative frameworks. The main approximation 
in this paper is that the true QM wavefunction is re- 
placed by an averaged one that is calculated in the MM 
mean field. We stress however that the electrostatic in- 
teractions between the averaged QM wavefunction and 
the MM environment are calculated correctly without 
further approximations. The basic appeal of the mean- 
field QM/MM approach is that it can describe different 
environments (e.g., solutions and enzymes) on an equal 
theoretical footing, while the number of QM calculations 
can be made significantly smaller than a direct QM/MM 
calculation. 
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APPENDIX A: CONTINUOUS 
REPRESENTATION 



The main text is based on the approximate 
Schrodinger equation in Eq. ([7]), where QM/MM electro- 
static interactions are "discretized" in terms of the ESP 
charge operator. In this section we summarize an alterna- 
tive formulation using the continuous Schrodinger equa- 
tion in Eq. ([5]). 

First, the total energy is given by 



E{R,-R+) = fQM[R,i;MM] +£mm(R,R+), (A1) 



where £qm[R, i'mm] is defined via Eq. ^ with w'(x) — 
i'mm(x, R^). We then expand £qm[R, ^^mm] in terms of 
i'mm(x, R^) up to first order. 



fQM[R,fMM] ^ £QM[R,t''1+ y"dxp^^(x|R){t;MM(x,R+)-i;^^(x|R)} 



E^qmIR,^' 



(ixp"'=(x|R)wMM(x,R+), 
I 



(A2) 



where _Bqm [R, i''''^] = (^''^'^I^qmI^''"^) and we have used average defined by 
the following Hellmann-Feynman theorem: 



mil,v'M^mR,v']). (A3) 



p'''^(x|R) and t;'''^(x|R) are obtained from the following 
self-consistency condition. 



i;"'=(x|R) =< t;MM(x,R+) >p=o, 



p-(x|R) = (vl/-|p(x)|vl/-). 



with ^^'^ = ^'[R, v^'^]. Note that v^'^ and p^'^ depend para- 
metrically on R via Eq. (|A4() . ^ ■ • • 3> is the statistical 



/rfR+e-^g[p'-^-^^l(-- 

/dR+e-/3£[p'.R-.R-+l 



with 



(A5) 



£[p',R,R+]^ / dxp'(x)wMM(x,R+)+fMM(R,R+). 
(A4a) (Ag) 

Inserting the above first-order expansion into A{R) in 
(A4b) Eq- (El) gives the QM/MM free energy with mean- field 
embedding. 



A""'' (R) = i^QM [R, + A Amm [R, pn , ( A7) 
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with 



APPENDIX B: ESP CHARGE OPERATOR 



A^mm[R,p'] = -^In/ rfR+e-^^[^''^^^"l. (A8) 



The gradient of A'^^(R) can be obtained via similar ar- 
guments as 



9A^mm[R, p'] 



9R 



(A9) 

The first term represents the energy gradient in a fixed 
external field. The second term may be rewritten using 
Eq. jMI as 



5AAmm[R, p'] 



aR 



d^MMC^i R^) 



(AlO) 

Note that the above equation lacks the electrostatic term 
like Q^^ ^ dvMM,a/d'Ra 3> that is present in the the dis- 
cretized case [Eq. psp ]. This discrepancy originates from 
the different physical meaning of 5f qm [R, v'] / 9R (in the 
continuous representation) and 9£qm(R, v')/9R (in the 
discretized representation). In the discretized case, the 
external potential values v' acting on QM atoms are kept 
constant while varying the nuclear coordinates R. In the 
continuous case, the external potential field w'(x) is kept 
constant while varying R. This means that the potential 
values acting on QM atoms, z;'(Ra), may vary as a func- 
tion of R. The situation becomes clear by considering 
the following relation: 



9fQM[R, v'] 



9Ra 



d 



fQM(R,{^'^^(Rh|X)}) 



d-Ra 

afQM(R,v') 



9Ra 



X=R 

(All) 



^sc.RN ^ 9^MM(Ra,R+) 



where we have used the following identity obtained from 
Eq. (|A4aP : 



9w='=(R„|X) 



au"=(x|R) 



X=R 



9j 



(A12) 



9i;MM(x, R+) 



x=R„ 



Therefore, it follows that the missing electrostatic term 
in Eq. (jAlOp is now accounted for by the energy gradient 
term in Eq. (|A9p . 



The ESP charges {Qa] for a given wavefunction are 
obtained by minimizing the following function^ 



2;rid 



|G;-x| 

Qa — Qu 



E 



G/ — RqI 



(Bl) 



where {G;} are the ESP fitting grid and Qtot is the total 
charge. By requiring that dL{Q, \)/dQ = 0, inverting 
the resulting linear equations for Q, and determining A 
via J2a Qa = Qtot, we obtain 



Qa^{^\Qa\^), 



(B2) 



where Qa is an explicit function of {rj}, {Ra}, {G^}, and 
Qtot, with {ri} being the electron coordinates. See previ- 
ous work for the explicit form of Qa in the atomic orbital 
basis i^i^i^i^i^ The above definition of Q suggests that 
one may make the following replacement 



c?x- 



/5(x) 



Rn 



(B3) 



as long as y is located outside the core region of the QM 
charge density. Then, the continuous QM/MM electro- 
static interaction may be discretized as 

/ dX/5(x)i;MM(x, R+) ~ ^ QaVMM{'R.a,R'^), (B4) 

by inserting the definition of wmm(x, R^) in Eq. This 
is the present rationale for using Q in Eq. ([7]). 



APPENDIX C: COMPARISON WITH THE 
QM/MM-MFEP METHOD 

Here we compare the perturbative treatment of 
A'^''(R) in Sec. HTC] with the QM/MM-MFEP 
methodi^i^ The starting point is the same as Eq. ([3]) 
(here expressed using the notation in Ref. Al', ). 

^(i"qm) = -^In y drMMexp{-/3£;(rQM,rMM)}, (CI) 

where tqm — R, tmm — R^, and the total energy is 
given by 

£^(rQM,rMM) = (*|^^cff|*) +^^MM(rQM,rMM)- (C2) 

HcS is the effective QM Hamiltonian in the presence of 
the MM electrostatic field. 



-f^eff = -ffqM + / c;x/5(x)wmm(x, tmm), (C3) 
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and other quantities like Hqm are defined in the main 
text. We then introduce the MM mean field as 



1 



(C4) 



where {r^M(''')}T=i ^ set of MM configurations ob- 
tained from the previous cycle of the sequential sam- 
pling/optimization method4i The QM Hamiltonian in 
the presence of the MM mean field is defined by 



(C5) 



The eigenfunction and eigenenergy of H°g are denoted 

as |*°) and {^°\H°s\'^°) = (*|i?eff|*)°, and the ESP 
charges derived from \'i>°) are written as Q°(rQM)- The 
internal QM energy associated with H°fj is defined as 



E°{rQM) = (*|i^cff|*)° 



QM 

E 



(rQM)wMM(rQM,i); 



(C6) 

i.e., by subtracting the QM/MM electrostatic interaction 
energy expressed in terms of ESP charges from the effec- 
tive QM energy. 

The QM/MM-MFEP method then develops a series of 
polarizable QM models by Taylor expanding its energy 
and ESP charges up to first or second order. Among oth- 
ers, Model 3 ("QM point charges with polarization due 
to MM and QM atoms") approximates the total energy 
as follows [Eqs. (36) and (40) of Ref.'iH: 



where Xij is the charge response kernel in Eq. (|48)) . The 
above equation may be viewed as the second-order expan- 
sion of the effective QM energy in terms of MM electro- 
static potential [cf. Eq. (gT])]. The gradient of QM/MM 
free energy is obtained by inserting Eq. (jC8|) into the 
following. 



QM 



dr, 



QM 



(C8) 



or alternately, into an FEP-type expression [Eq. (6) of 
Ref . 411 



I 9-E(rQM,rMM) 
\ arc-' ^ 



■/3(i5(rQM,rMM)— -Erof (tmm)) 



g-/3(-E(rQM,rMM)--Brcf (rMM)) 



(C9) 

where £^rcf (i^mm) is the reference sampling function that 
is obtained from the previous cycle of the sequential sam- 
pling/optimization method4i 

The main difference of the present approach from the 
QM/MM-MFEP method is that the present one utihzes 
the self-consistency condition in order to simplify the gra- 
dient expression. To see this, let us insert £'(rQM,rMM) 
in Eq. (jC8|l into the statistical average in Eq. (jCSp . 



9A(rQM) ^ 9gi°(rQM) 
9r, 



QM 



9rQM 

QM 



E 



-£'(rQM,rMM) = £'J'(rQM) 

QM 

+ E Q^(''QM)^MM(rQM,z, TMm) 



(C7) 



gQ°(l-QM) 

9rQM 

QM 

E'3i(''QM) 



(wMM(rQM,i, TMm)) e 
9wMM(rQM,i, Tmm 



i9r, 



QM 



QM QM 



9 E E ["MM (r QM ,z , rMM ) - 1'MM(l"QM,i)] 



+ 



t^^MM(rQM) Tmm) 

9rQM 



(CIO) 



* 1 



XXii['f^MM(rQM,i, TMm) — 'f^MM(l"QM,i)] 
^MM(rQM, TMm), 



where terms depending on Xij have been neglected (they 
are treated separately in Sec. IIID| l. Using Eq. (|C6|) . we 
may rewrite the above equation as 



gA(rQM) di,^\U,^\^r , ^ gQ°(rQM) r/ , A\ o , u 

— (- > ^ 1\ i\«MMirQM,i,rMMj;i5 - t'MMll'QM.ij) 

OTqm OTqm CrQM 

,V^^o/ ^ //9i;MM(rQM,i,rMM) \ C^^^MM(l"QM,i) 1 , /c^^MM(rQM,rMM) \ |r^^^\ 

+l.o.(rQM)|^ — — 9?^r\ — — 



Now let us assume that the reference MM coordinates tion 
{'"mm (''')} satisfies the following self-consistency condi- 

1 ^ 

(/(rQM,rMM))_B =i -^Xl'^('"QM,rMM(^)), V/, (C12) 

T = l 
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which is expected to hold well for the last few 
cycles of the sequential sampling/optimization 
method^iThen, by setting / = WMM(rQM,i, tmm) 
or / = 9wMM(rQM,i, rMM)/9rQM, we have 



{VMM 



1 ^ 

(l"QM,i,I"MM)}£; - J^^VMM{lCQM,i,rMM{'T)) 



T = l 



^MM(l"QM,i): 



(C13a) 



9i;MM(rQM,i, Tmm) 



dr, 



QM 



1 9i;MM(rQM,i,rMM('^)) 



Ore 



_ t^^MM(l"QM,j) 



9rQ 



M 



(C13b) 



which suggest that the curly brackets in Eq. (jClip vanish, 
and as a result we obtain a simpler expression for the free 
energy gradient, 



aA(rQM) (9(*|iJoff|*)° , /9£:MM(rQM,rMM 



dr, 



QM 



9rQ 



M 



arc 



(C14) 

This form is found to be equivalent with the present gra- 
dient expressions, e.g., Eq. (|A9p . 



APPENDIX D: MIXED REPRESENTATION 

As seen from Eqs. jGeJ and dCS]), the QM/MM- 
MFEP method is based on a "mixed" representation 
of the QM/MM electrostatic interactions. That is, 
the QM wavefunction is calculated with the continuous 
Schrodinger equation in Eq. ([5]), while the internal QM 
energy etc are defined in terms of ESP charges. In this 
mixed representation, yl'^^(R) may be defined as 

a 

+AAmm(R,Q''=), (D1) 

and the mixed form of the self-consistency condition is 

?;"'=(x|R) = < wmm(x,R+) >Q.c, (D2a) 
Qr(R) = (^'^IQal*^^), (D2b) 



where W = *[R, The gradient of ^'^^(R) then 
becomes 



J 



A^P(R) 



9Ra 



^QM[R,'f'] 



5Ra 



E 



gQr(R) 

aRa 



{« WMM(Rfc,R+) »Q- -w''=(Rb|R)} 



,^sc,^^ ; a^MM(Ra,R+) a^;-(Ra|x) 



dxp'^^CxIR) 



9Ra 

aw'''^(x|R) 
aRa 



-^gr(R) 



9Ra 

av^'=(x|R) 



X=R 



9£mm(R, R^) 

« — 9r: — ^"^'^ 



aRa 



(D3) 



where p'''=(x|R) = (vl''''=|p(x)|^l''''=). The curly brackets in 
the above equation vanish by using Eq. (|D2a[) and its 
derivative with respect to x [see also Eq. (IA12P ]. The 
third line also vanishes approximately since Q*"^ represent 
the ESP charges that correspond to p^'^ . Therefore, we 
obtain the following gradient: 



d 



9Ra 



A^i^(R) 



£qm[R, v'\ 



9Ra 



<9i?mm(R, R^ 



5Ra 



(D4) 



APPENDIX E: GENERALIZATION TO 
NON- VARIATIONAL QM METHODS 

The main text assumes that the underlying QM wave- 
function is exact or calculated using QM methods with 
variational nature (e.g., Hartree-Fock and DFT). This 
means that the Hellmann-Feynman theorem holds and it 



can be used to define partial charges via Eq. (|25|) . How- 
ever, this is not the case for non- variational QM methods 
like the MP2 theory. In the latter case, one needs to gen- 
eralize the definition of partial charges as follows, 



Q(R,v') 



9gQM(R,vO 



(El) 



since the first derivative of effective QM energy plays the 
role of partial charges as described in Sec. Ill CI Accord- 
ingly, one needs to define the internal QM energy as 



^qm(R, v') = fQM(R, v') - Q(R, v') • v' 



(E2) 



With these definitions the discussion in Sec. Ill Cl remains 
valid. However, the actual calculation of generalized par- 
tial charges in Eq. (|E1[) may be tedious unless some ana- 
lytical algorithms are available. Fortunately, in the MP2 
method one can avoid such a calculation by discarding 
higher-order terms in correlation energy. ^^■''^ To see this, 
let us denote relevant quantities at the MP2 level as £'mp2 
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and Qmp2 ^^"^ ^he difference between the MP2 and 
HF levels as AQ — Qmp2 ~ Qhf Then, the mean- 
field free energy at the MP2 level may be written as 



iMP2 



= E 



MP2(vmP2 



— fMP2(vMp2) - Q 



AAmm(QmP2) (E3) 
AAmm(QmP2)- 



MP2 



'MP2 



Since O(A^) is of higher order in correlation energy^^^ 
it may safely be neglected at the MP2 level. The MP2 
correction for free energy is thus given by AfMP2, and we 
do not need to calculate Qmp9 nor v|Jjp2 explicitly. The 
AfMP2 can be evaluated using the standard expression 



By inserting Q^p2 = Q|j=f + and v^^jp2 = vffp + Av, 
and making the first-order expansion in terms of AQ and 
Av, we have 



A 



MP2 



fMP2(vHF) 
Ahf + AfMP2 



Q 



HF 



'HF 



HF) 



0(A2), 



■0(A2) 
(E4) 



MP2 



-E- 

^ , '-■0 
abrs 



\{ab\\rs)\'^ 



(E6) 



where 



AfMP2 = 'i^MP2(vHF) ~ ^Hf(Vhp). 



(E5) where {sa} etc are obtained with Hi 



QM 



Q 



'HF- 
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